Skip-Thought Vectors
Authors
Abstract
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice. We will make our encoder publicly available.
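The vocabulary expansion step can be sketched as follows: collect the word2vec vectors and the encoder's trained embeddings for the words the two vocabularies share, fit a linear map between the two spaces by least squares, and use that map to project any unseen word's word2vec vector into the encoder's embedding space. A minimal NumPy sketch, with random stand-ins for the real embedding tables and illustrative dimensions:

```python
import numpy as np

# Illustrative dimensions: word2vec space (300-d) -> encoder embedding space (620-d).
rng = np.random.default_rng(0)
d_w2v, d_rnn, n_shared = 300, 620, 1000

# Embeddings for words present in BOTH vocabularies, rows aligned by word.
V_w2v = rng.normal(size=(n_shared, d_w2v))   # word2vec vectors (stand-in)
V_rnn = rng.normal(size=(n_shared, d_rnn))   # encoder's trained embeddings (stand-in)

# Solve V_w2v @ W ~= V_rnn in the least-squares sense.
W, *_ = np.linalg.lstsq(V_w2v, V_rnn, rcond=None)

# A word unseen during training: map its word2vec vector into the encoder's space.
v_unseen = rng.normal(size=(d_w2v,))
v_mapped = v_unseen @ W
print(v_mapped.shape)  # (620,)
```

With the map fitted once, every word that has a word2vec vector gets an encoder-side embedding for free, which is what lets the vocabulary grow to a million words without retraining the encoder.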
Similar Resources
Skip-Prop: Representing Sentences with One Vector Per Proposition
We introduce the notion of a multi-vector sentence representation based on a “one vector per proposition” philosophy, which we term skip-prop vectors. By representing each predicate-argument structure in a complex sentence as an individual vector, skip-prop is (1) a response to empirical evidence that single-vector sentence representations degrade with sentence length, and (2) a representation ...
Skip-Thought Memory Networks
Question Answering (QA) is fundamental to natural language processing in that most NLP problems can be phrased as QA (Kumar et al., 2015). Current weakly supervised memory network models struggle at answering questions that involve relations among multiple entities (such as Facebook's bAbI QA5 three-arg-relations task in (Weston et al., 2015)). To address this problem ...
Trimming and Improving Skip-thought Vectors
The skip-thought model has been proven to be effective at learning sentence representations and capturing sentence semantics. In this paper, we propose a suite of techniques to trim and improve it. First, we validate a hypothesis that, given a current sentence, inferring the previous and inferring the next sentence provide similar supervision power, therefore only one decoder for predicting the...
Rethinking Skip-thought: A Neighborhood based Approach
We study the skip-thought model proposed by Kiros et al. (2015) with neighborhood information as weak supervision. More specifically, we propose a skip-thought neighbor model to consider the adjacent sentences as a neighborhood. We train our skip-thought neighbor model on a large corpus with continuous sentences, and then evaluate the trained model on 7 tasks, which include semantic relatedness...
PolyUCOMP: Combining Semantic Vectors with Skip bigrams for Semantic Textual Similarity
This paper presents the work of the Hong Kong Polytechnic University (PolyUCOMP) team which has participated in the Semantic Textual Similarity task of SemEval-2012. The PolyUCOMP system combines semantic vectors with skip bigrams to determine sentence similarity. The semantic vector is used to compute similarities between sentence pairs using the lexical database WordNet and the Wikipedia corp...
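Skip bigrams themselves are easy to illustrate: they are ordered pairs of tokens that allow a bounded number of intervening tokens to be skipped, rather than requiring adjacency. A minimal sketch (the function name and window size are illustrative, not from the PolyUCOMP system):

```python
def skip_bigrams(tokens, max_skip=2):
    """Ordered token pairs with up to `max_skip` intervening tokens skipped."""
    pairs = []
    for i, left in enumerate(tokens):
        # Partner index j may be at most max_skip + 1 positions to the right.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs

print(skip_bigrams(["the", "cat", "sat"], max_skip=1))
# [('the', 'cat'), ('the', 'sat'), ('cat', 'sat')]
```

Because skipped-over words are tolerated, skip bigrams capture loose word-order overlap between two sentences that plain bigrams would miss.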
Representing Sentences as Low-Rank Subspaces
Sentences are important semantic units of natural language. A generic, distributional representation of sentences that can capture the latent semantics is beneficial to multiple downstream applications. We observe a simple geometry of sentences – the word representations of a given sentence (on average 10.23 words in all SemEval datasets with a standard deviation 4.84) roughly lie in a low-rank...
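The low-rank observation above can be sketched directly: stack a sentence's word vectors as rows, center them, and take the top few right singular vectors as an orthonormal basis for the sentence's subspace. A minimal NumPy sketch with random stand-in word vectors (the rank and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, dim, rank = 10, 50, 4

# Word vectors of one sentence, stacked as rows, then centered.
X = rng.normal(size=(n_words, dim))
Xc = X - X.mean(axis=0)

# Top-`rank` right singular vectors span the sentence's low-rank subspace.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
U_sent = Vt[:rank]   # (rank, dim) orthonormal basis

# Fraction of variance the subspace captures.
explained = float((s[:rank] ** 2).sum() / (s ** 2).sum())
print(U_sent.shape)
```

On real word embeddings the claim is that this small `rank` already captures most of the variance; with the random vectors used here the fraction is only whatever chance gives.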
Journal:
Volume Issue
Pages -
Published: 2015